Hierarchical Document Categorization Using Associative Networks
نویسندگان
چکیده
Associative networks are a connectionist language model with the ability to handle dynamic data. We used two associative networks to categorize random sets of related Wikipedia articles with only their raw text as input. We then compared the resulting categorization to a gold standard: the manual categorization by Wikipedia authors and used a neural network as a baseline. We also determined a human rating by having a group of judges rank the four categorization methods by correctness and by usefulness with regards to finding information. Based on these tests, we determined that associative networks produce results that are clearly better than the neural network baseline, coming close to the gold standard in terms of usefulness and correctness. Furthermore, automated testing suggests these results continue to hold for large datasets.
منابع مشابه
Document Categorization using Multilingual Associative Networks based on Wikipedia
Associative networks are a connectionist language model with the ability to categorize large sets of documents. In this research we combine monolingual associative networks based on Wikipedia to create a larger, multilingual associative network, using the cross-lingual connections between Wikipedia articles. We prove that such multilingual associative networks perform better than monolingual as...
متن کاملUsing Natural Language Processing to Improve Document Categorization with Associative Networks
Associative networks are a connectionist language model with the ability to handle large sets of documents. In this research we investigated the use of natural language processing techniques (part-of-speech tagging and parsing) in combination with Associative Networks for document categorization and compare the results to a TF-IDF baseline. By filtering out unwanted observations and preselectin...
متن کاملGrouping by association: using associative networks for document categorization
In this thesis we describe a method of using associative networks for automatic document grouping. Associative networks are networks of ideas or concepts in which each concept is linked to concepts that are semantically similar to it. By activating concepts in the network based on the text of a document and spreading this activation to related concepts, we can determine which concepts are relat...
متن کاملA Theoretical Framework for Web Categorization in Hierarchical Directories using Bayesian Networks
In this paper, we shall present a theoretical framework for classifying web pages in a hierarchical directory using the Bayesian Network formalism. In particular, we shall focus on the problem of multi-label text categorization, where a given document can be assigned to any number of categories in the hierarchy. The idea is to explicitly represent the dependence relationships between the differ...
متن کاملKeywords, k-NN and Neural Networks: a Support for Hierarchical Categorization of Texts in Brazilian Portuguese
A frequent problem in automatic categorization applications involving Portuguese language is the absence of large corpora of previously classified documents, which permit the validation of experiments carried out. Generally, the available corpora are not classified or, when they are, they contain a very reduced number of documents. The general goal of this study is to contribute to the developm...
متن کامل